Creating Interactive Graphs for the Web
Usefulness in a data science workflow
Exploratory work:
Expository work:
Share-able, portable, composable (i.e., reports, dashboards, etc)
Grammar of graphics
- Leland Wilkinson (1980): A grammar of graphics is a framework which follows a layered approach to describe and construct visualizations or graphics in a structured manner.
Many libraries abide by this grammar: ggplot2, plotly, D3 and many others.
Grammar of graphics and ggplot2
The gg in ggplot2 stands for Grammar of Graphics.
“The transferrable skills from ggplot2 are not the idiosyncracies of plotting syntax, but a powerful way of thinking about visualisation, as a way of mapping between variables and the visual properties of geometric objects that you can perceive.”
Advantages
- Functional data visualization
- Clean and wrangle data
- Map data to visual elements
- Tweak scales, guides, axis, labels and theme
- Ease of iteration
- Ease in maintaining consistency
Prepare your system
Update R and RStudio to the latest, released version: https://www.rstudio.com/products/rstudio/download/
Download and load the following packages:
install.packages(c("devtools", "knitr",
"usethis", "tidyverse",
"plotly", "r2d3",
"crosstalk", "shiny",
"flexdashboard"))Data source: espnscraperR, which collects or scrapes QBR, NFL standings, and stats from ESPN
library(tidyverse)
library(espnscrapeR)
nfl_qbr <-
crossing(season = 2020, week = 1:6) %>%
pmap_dfr(espnscrapeR::get_nfl_qbr)Consider working in the preview version of RStudio as well for latest features (that are more stable than the daily build): https://www.rstudio.com/products/rstudio/download/preview/
Plotly
Plotly R package allows you to create a variety of interactive graphics
Two ways:
- transforming a ggplot2 object into a plotly object via
ggplotly() - directly initializing a plotly object with
plot_ly(),plot_geo()orplot_mapbox()
- transforming a ggplot2 object into a plotly object via
There are strengths, weaknesses to either approach
Learning both will pay dividends, as they share a common grammar and can be reused
Intro to ggplotly()
library(plotly)
nfl_qbr_plot <- nfl_qbr %>%
ggplot(aes(x = total_epa, y = qbr_total, label = short_name)) +
geom_point() +
geom_smooth(method = "lm") +
geom_label() +
theme_minimal() +
labs(
x = "EPA", y = "QBR",
title = "EPA is correlated with QBR"
)
nfl_qbr_plotIntro to plot_ly()
Any plot made with
plot_ly()uses the JavaScript library plotly.jsplot_ly()interfaced directly with plotly.jsplot_lyhas arguments that fit into the “Grammar of Graphics”:x,ycolor,stroke,span,symbol,linetype
There is a family of
add_*functions:add_lines()add_barsadd_histogram2d()add_contouradd_boxplot- …and many more!
The plotly package takes a pure functional approach to a layered grammar of graphics, meaning (almost) every function anticipates a plotly object as input to it’s first argument and returns a modified version of that plotly object.
Example: the layout() function anticipates a plotly object in it’s first argument and its other arguments add and/or modify various layout components of that object (e.g., the title):
layout(
plot_ly(nfl_qbr %>% filter(game_week == 3), x = ~short_name, y = ~total_epa, color = ~rank),
title = "QB Total EPA for Week 3"
)Notice that data manipulation verbs from the dplyr package (such as filter()) can be used to transform the data underlying a plotly object!
Or, in a more cleaner fashion:
nfl_qbr %>%
filter(game_week == 3) %>%
plot_ly(x = ~short_name, y = ~total_epa, color = ~rank) %>%
add_bars() You can also add a layer of text using the summarized counts. Note that the global x mapping, as well as the other mappings local to this text layer (text and y), reflect data values from step 3:
nfl_qbr %>%
filter(game_week == 3) %>%
plot_ly(x = ~short_name, y = ~total_epa, color = ~rank) %>%
add_bars() %>%
add_text(
text = ~team,
textposition = "top middle",
cliponaxis = F
)To recap:
- Globally assign
short_name,total_epa,ranktox,yandcolor, respectively - Add a bars layer (which inherits the y from
plot_ly) - Use dplyr verbs to modify the data underlying the plotly object
- Add additional layers (like a layer of text)
- Share
D3 and Leaflet
D3 and Leaflet are also two popular JavaScript libraries whose R interfaces is based on the grammar of graphics. Like plotly, D3 and Leaflet can be used in a data pipeline with other R tools to clean, wrangle, transform and model with consistency and ease of iteration.
D3
The amazing dreamRs team is behind the {r2d3maps} package which allows you to make D3 maps in a flash. Check out their documentation to get a taste of what you can make: https://github.com/dreamRs/r2d3maps.
# Packages ----------------------------------------------------------------
library(janitor)
library(albersusa)
library(sf)
library(tigris)
library(r2d3maps)
library(rmapshaper)
# County geometries -------------------------------------------------------
cty_sf <- counties_sf("longlat") # from albersusa package
plot(st_geometry(cty_sf)) # checking geometries, looks goodcty_sf <- ms_simplify(cty_sf) # make the geometries simpler for faster rendering
# Data ---------------------------------------------------------------------
# CDC's social vulnerability index variables: https://svi.cdc.gov/Documents/Data/2018_SVI_Data/SVI2018Documentation.pdf
svi <- read_csv("/Users/atoumi/Desktop/interactive_graphics/svi_county.csv") %>%
clean_names() %>% # make everything snake_case
filter(across(where(is.numeric), ~. >= 0)) # because NA's are coded as -Inf
# join data to geometries
cty_sf_joined <- cty_sf %>% geo_join(svi, by_sp = "fips", by_df = "fips")
# make d3 map
d3_map(shape = cty_sf_joined, projection = "Albers") %>%
add_labs(caption = "Viz: Asmae Toumi | Data: CDC") %>%
add_continuous_breaks(var = "rpl_theme1", palette = "Reds") %>%
add_legend(title = "Socioeconomic Vulnerability (percentile)") %>%
add_tooltip("<b>{location}</b>: {rpl_theme1}")Leaflet
Like plotly, leaflet has an R interface. Unlike D3, Leaflet is more suited for maps. Leaflet is well-suited for the web, highly customizable and integrates well with R’s ecosystem.
##
|
| | 0%
|
|= | 2%
|
|=== | 4%
|
|==== | 6%
|
|===== | 7%
|
|====== | 9%
|
|======== | 11%
|
|======== | 12%
|
|========== | 14%
|
|========== | 15%
|
|============ | 17%
|
|============= | 19%
|
|============== | 19%
|
|=============== | 21%
|
|================ | 22%
|
|================= | 24%
|
|================== | 26%
|
|=================== | 27%
|
|==================== | 29%
|
|===================== | 30%
|
|====================== | 32%
|
|======================= | 34%
|
|======================== | 34%
|
|========================= | 36%
|
|========================== | 37%
|
|=========================== | 39%
|
|============================= | 41%
|
|============================= | 42%
|
|=============================== | 44%
|
|=============================== | 45%
|
|================================= | 47%
|
|================================== | 48%
|
|=================================== | 49%
|
|==================================== | 51%
|
|===================================== | 52%
|
|====================================== | 54%
|
|======================================= | 56%
|
|======================================== | 57%
|
|========================================= | 59%
|
|========================================== | 60%
|
|=========================================== | 62%
|
|============================================ | 63%
|
|============================================= | 64%
|
|============================================== | 66%
|
|=============================================== | 67%
|
|================================================ | 69%
|
|================================================== | 71%
|
|================================================== | 72%
|
|==================================================== | 74%
|
|==================================================== | 75%
|
|====================================================== | 77%
|
|======================================================= | 78%
|
|======================================================== | 79%
|
|========================================================= | 81%
|
|========================================================== | 82%
|
|=========================================================== | 84%
|
|============================================================ | 86%
|
|============================================================= | 87%
|
|============================================================== | 89%
|
|=============================================================== | 90%
|
|================================================================ | 92%
|
|================================================================= | 93%
|
|================================================================== | 94%
|
|=================================================================== | 96%
|
|==================================================================== | 97%
|
|===================================================================== | 99%
|
|======================================================================| 100%
library(tidycensus)
us_county_income <-
get_acs(geography = "county", variables = "B19013_001") %>%
select(fips = GEOID, name = NAME, income = "estimate") %>%
drop_na()
cty <- cty %>%
geo_join(us_county_income, by_df = "fips", by_sp = "fips")
library(leaflet)
library(RColorBrewer)
library(htmltools)
pal <- colorNumeric("YlOrRd", domain = cty$income)
labels <-
sprintf(
"<strong>%s</strong><br/> Income: %s $",
cty$name, cty$income) %>%
lapply(htmltools::HTML)
leaflet(cty) %>%
setView(-96, 37.8, 4) %>%
addProviderTiles("CartoDB.PositronNoLabels") %>%
addPolygons(
fillColor = ~pal(income),
weight = 1,
opacity = 1,
color = "white",
fillOpacity = 0.7,
highlight = highlightOptions(
weight = 2,
color = "#666",
fillOpacity = 0.7,
bringToFront = TRUE),
label = labels,
labelOptions = labelOptions(
style = list("font-weight" = "normal"),
textsize = "15px",
direction = "auto")) %>%
addLegend(pal = pal,
values = cty$income,
position = "bottomright",
title = "Income (5-year ACS)",
labFormat = labelFormat(suffix = "$"),
opacity = 0.8,
na.label = "No data")See leaflet.R for standalone code.
Lessons learned
- The amount of interactive techniques is overwhelming
- Focus should be on identifying an analysis task/question first
- There will be different ways of doing the same thing, pick the most sensible to you
Ideally, according to Carson Sievert:
Go for R packages/technologies for creating interactive web graphics which:
- Don’t require knowledge of web technologies (start-up cost)
- Produce standalone HTML whenever possible (hosting/maintenance cost)
- Work well with other “tidy” tools in R (iteration cost)
- Link to external vis libraries (startover cost)
- Easy to use interactive techniques that support data analysis tasks (discovery cost)
References
Interactive web-based data visualization with R, plotly, and shiny by Carson Sievert: https://plotly-r.com/index.html
A Gentle Guide to the Grammar of Graphics with ggplot2 by Garrick Aden-Buie*: https://pkg.garrickadenbuie.com/trug-ggplot2/#1
espnscrapeR package by Thomas Mock: https://jthomasmock.github.io/espnscrapeR/index.html
My Talk on Grammar of Graphics: The Secret Sauce of Powerful Data Stories by Ganes Kesari: https://medium.com/@kesari/my-talk-on-grammar-of-graphics-the-secret-sauce-of-powerful-data-stories-3da618cf1bbf